74 research outputs found

    Bimanual robot skills: MP encoding, dimensionality reduction and reinforcement learning

    Get PDF
    Aplicat embargament des de la data de defensa fins 1/7/2018Premio a la mejor Tesis Doctoral sobre Robótica, Edición 2017, atorgat pel Comité Español de Automática.Finalista del 2018 George Girault PhD Award, from EuRoboticsIn our culture, robots have been in novels and cinema for a long time, but it has been specially in the last two decades when the improvements in hardware - better computational power and components - and advances in Artificial Intelligence (AI), have allowed robots to start sharing spaces with humans. Such situations require, aside from ethical considerations, robots to be able to move with both compliance and precision, and learn at different levels, such as perception, planning, and motion, being the latter the focus of this work. The first issue addressed in this thesis is inverse kinematics for redundant robot manipulators, i.e: positioning the robot joints so as to reach a certain end-effector pose. We opt for iterative solutions based on the inversion of the kinematic Jacobian of a robot, and propose to filter and limit the gains in the spectral domain, while also unifying such approach with a continuous, multipriority scheme. Such inverse kinematics method is then used to derive manipulability in the whole workspace of an antropomorphic arm, and the coordination of two arms is subsequently optimized by finding their best relative positioning. Having solved the kinematic issues, a robot learning within a human environment needs to move compliantly, with limited amount of force, in order not to harm any humans or cause any damage, while being as precise as possible. Therefore, we developed two dynamic models for the same redundant arm we had analysed kinematically: The first based on local models with Gaussian projections, and the second characterizing the most problematic term of the dynamics, namely friction. Such models allowed us to implement feed-forward controllers, where we can actively change the weights in the compliance-precision tradeoff. Moreover, we used such models to predict external forces acting on the robot, without the use of force sensors. Afterwards, we noticed that bimanual robots must coordinate their components (or limbs) and be able to adapt to new situations with ease. Over the last decade, a number of successful applications for learning robot motion tasks have been published. However, due to the complexity of a complete system including all the required elements, most of these applications involve only simple robots with a large number of high-end technology sensors, or consist of very simple and controlled tasks. Using our previous framework for kinematics and control, we relied on two types of movement primitives to encapsulate robot motion. Such movement primitives are very suitable for using reinforcement learning. In particular, we used direct policy search, which uses the motion parametrization as the policy itself. In order to improve the learning speed in real robot applications, we generalized a policy search algorithm to give some importance to samples yielding a bad result, and we paid special attention to the dimensionality of the motion parametrization. We reduced such dimensionality with linear methods, using the rewards obtained through motion repetition and execution. We tested such framework in a bimanual task performed by two antropomorphic arms, such as the folding of garments, showing how a reduced dimensionality can provide qualitative information about robot couplings and help to speed up the learning of tasks when robot motion executions are costly.A la nostra cultura, els robots han estat presents en novel·les i cinema des de fa dècades, però ha sigut especialment en les últimes dues quan les millores en hardware (millors capacitats de còmput) i els avenços en intel·ligència artificial han permès que els robots comencin a compartir espais amb els humans. Aquestes situacions requereixen, a banda de consideracions ètiques, que els robots siguin capaços de moure's tant amb suavitat com amb precisió, i d'aprendre a diferents nivells, com són la percepció, planificació i moviment, essent l'última el centre d'atenció d'aquest treball. El primer problema adreçat en aquesta tesi és la cinemàtica inversa, i.e.: posicionar les articulacions del robot de manera que l'efector final estigui en una certa posició i orientació. Hem estudiat el camp de les solucions iteratives, basades en la inversió del Jacobià cinemàtic d'un robot, i proposem un filtre que limita els guanys en el seu domini espectral, mentre també unifiquem tal mètode dins un esquema multi-prioritat i continu. Aquest mètode per a la cinemàtica inversa és usat a l'hora d'encapsular tota la informació sobre l'espai de treball d'un braç antropomòrfic, i les capacitats de coordinació entre dos braços són optimitzades, tot trobant la seva millor posició relativa en l'espai. Havent resolt les dificultats cinemàtiques, un robot que aprèn en un entorn humà necessita moure's amb suavitat exercint unes forces limitades per tal de no causar danys, mentre es mou amb la màxima precisió possible. Per tant, hem desenvolupat dos models dinàmics per al mateix braç robòtic redundant que havíem analitzat des del punt de vista cinemàtic: El primer basat en models locals amb projeccions de Gaussianes i el segon, caracteritzant el terme més problemàtic i difícil de representar de la dinàmica, la fricció. Aquests models ens van permetre utilitzar controladors coneguts com "feed-forward", on podem canviar activament els guanys buscant l'equilibri precisió-suavitat que més convingui. A més, hem usat aquests models per a inferir les forces externes actuant en el robot, sense la necessitat de sensors de força. Més endavant, ens hem adonat que els robots bimanuals han de coordinar els seus components (braços) i ser capaços d'adaptar-se a noves situacions amb facilitat. Al llarg de l'última dècada, diverses aplicacions per aprendre tasques motores robòtiques amb èxit han estat publicades. No obstant, degut a la complexitat d'un sistema complet que inclogui tots els elements necessaris, la majoria d'aquestes aplicacions consisteixen en robots més aviat simples amb costosos sensors d'última generació, o a resoldre tasques senzilles en un entorn molt controlat. Utilitzant el nostre treball en cinemàtica i control, ens hem basat en dos tipus de primitives de moviment per caracteritzar la motricitat robòtica. Aquestes primitives de moviment són molt adequades per usar aprenentatge per reforç. En particular, hem usat la búsqueda directa de la política, un camp de l'aprenentatge per reforç que usa la parametrització del moviment com la pròpia política. Per tal de millorar la velocitat d'aprenentatge en aplicacions amb robots reals, hem generalitzat un algoritme de búsqueda directa de política per a donar importància a les mostres amb mal resultat, i hem donat especial atenció a la reducció de dimensionalitat en la parametrització dels moviments. Hem reduït la dimensionalitat amb mètodes lineals, utilitzant les recompenses obtingudes EN executar els moviments. Aquests mètodes han estat provats en tasques bimanuals com són plegar roba, usant dos braços antropomòrfics. Els resultats mostren com la reducció de dimensionalitat pot aportar informació qualitativa d'una tasca, i al mateix temps ajuda a aprendre-la més ràpid quan les execucions amb robots reals són costoses.Award-winningPostprint (published version

    Bimanual robot skills: MP encoding, dimensionality reduction and reinforcement learning

    Get PDF
    In our culture, robots have been in novels and cinema for a long time, but it has been specially in the last two decades when the improvements in hardware - better computational power and components - and advances in Artificial Intelligence (AI), have allowed robots to start sharing spaces with humans. Such situations require, aside from ethical considerations, robots to be able to move with both compliance and precision, and learn at different levels, such as perception, planning, and motion, being the latter the focus of this work. The first issue addressed in this thesis is inverse kinematics for redundant robot manipulators, i.e: positioning the robot joints so as to reach a certain end-effector pose. We opt for iterative solutions based on the inversion of the kinematic Jacobian of a robot, and propose to filter and limit the gains in the spectral domain, while also unifying such approach with a continuous, multipriority scheme. Such inverse kinematics method is then used to derive manipulability in the whole workspace of an antropomorphic arm, and the coordination of two arms is subsequently optimized by finding their best relative positioning. Having solved the kinematic issues, a robot learning within a human environment needs to move compliantly, with limited amount of force, in order not to harm any humans or cause any damage, while being as precise as possible. Therefore, we developed two dynamic models for the same redundant arm we had analysed kinematically: The first based on local models with Gaussian projections, and the second characterizing the most problematic term of the dynamics, namely friction. Such models allowed us to implement feed-forward controllers, where we can actively change the weights in the compliance-precision tradeoff. Moreover, we used such models to predict external forces acting on the robot, without the use of force sensors. Afterwards, we noticed that bimanual robots must coordinate their components (or limbs) and be able to adapt to new situations with ease. Over the last decade, a number of successful applications for learning robot motion tasks have been published. However, due to the complexity of a complete system including all the required elements, most of these applications involve only simple robots with a large number of high-end technology sensors, or consist of very simple and controlled tasks. Using our previous framework for kinematics and control, we relied on two types of movement primitives to encapsulate robot motion. Such movement primitives are very suitable for using reinforcement learning. In particular, we used direct policy search, which uses the motion parametrization as the policy itself. In order to improve the learning speed in real robot applications, we generalized a policy search algorithm to give some importance to samples yielding a bad result, and we paid special attention to the dimensionality of the motion parametrization. We reduced such dimensionality with linear methods, using the rewards obtained through motion repetition and execution. We tested such framework in a bimanual task performed by two antropomorphic arms, such as the folding of garments, showing how a reduced dimensionality can provide qualitative information about robot couplings and help to speed up the learning of tasks when robot motion executions are costly.A la nostra cultura, els robots han estat presents en novel·les i cinema des de fa dècades, però ha sigut especialment en les últimes dues quan les millores en hardware (millors capacitats de còmput) i els avenços en intel·ligència artificial han permès que els robots comencin a compartir espais amb els humans. Aquestes situacions requereixen, a banda de consideracions ètiques, que els robots siguin capaços de moure's tant amb suavitat com amb precisió, i d'aprendre a diferents nivells, com són la percepció, planificació i moviment, essent l'última el centre d'atenció d'aquest treball. El primer problema adreçat en aquesta tesi és la cinemàtica inversa, i.e.: posicionar les articulacions del robot de manera que l'efector final estigui en una certa posició i orientació. Hem estudiat el camp de les solucions iteratives, basades en la inversió del Jacobià cinemàtic d'un robot, i proposem un filtre que limita els guanys en el seu domini espectral, mentre també unifiquem tal mètode dins un esquema multi-prioritat i continu. Aquest mètode per a la cinemàtica inversa és usat a l'hora d'encapsular tota la informació sobre l'espai de treball d'un braç antropomòrfic, i les capacitats de coordinació entre dos braços són optimitzades, tot trobant la seva millor posició relativa en l'espai. Havent resolt les dificultats cinemàtiques, un robot que aprèn en un entorn humà necessita moure's amb suavitat exercint unes forces limitades per tal de no causar danys, mentre es mou amb la màxima precisió possible. Per tant, hem desenvolupat dos models dinàmics per al mateix braç robòtic redundant que havíem analitzat des del punt de vista cinemàtic: El primer basat en models locals amb projeccions de Gaussianes i el segon, caracteritzant el terme més problemàtic i difícil de representar de la dinàmica, la fricció. Aquests models ens van permetre utilitzar controladors coneguts com "feed-forward", on podem canviar activament els guanys buscant l'equilibri precisió-suavitat que més convingui. A més, hem usat aquests models per a inferir les forces externes actuant en el robot, sense la necessitat de sensors de força. Més endavant, ens hem adonat que els robots bimanuals han de coordinar els seus components (braços) i ser capaços d'adaptar-se a noves situacions amb facilitat. Al llarg de l'última dècada, diverses aplicacions per aprendre tasques motores robòtiques amb èxit han estat publicades. No obstant, degut a la complexitat d'un sistema complet que inclogui tots els elements necessaris, la majoria d'aquestes aplicacions consisteixen en robots més aviat simples amb costosos sensors d'última generació, o a resoldre tasques senzilles en un entorn molt controlat. Utilitzant el nostre treball en cinemàtica i control, ens hem basat en dos tipus de primitives de moviment per caracteritzar la motricitat robòtica. Aquestes primitives de moviment són molt adequades per usar aprenentatge per reforç. En particular, hem usat la búsqueda directa de la política, un camp de l'aprenentatge per reforç que usa la parametrització del moviment com la pròpia política. Per tal de millorar la velocitat d'aprenentatge en aplicacions amb robots reals, hem generalitzat un algoritme de búsqueda directa de política per a donar importància a les mostres amb mal resultat, i hem donat especial atenció a la reducció de dimensionalitat en la parametrització dels moviments. Hem reduït la dimensionalitat amb mètodes lineals, utilitzant les recompenses obtingudes EN executar els moviments. Aquests mètodes han estat provats en tasques bimanuals com són plegar roba, usant dos braços antropomòrfics. Els resultats mostren com la reducció de dimensionalitat pot aportar informació qualitativa d'una tasca, i al mateix temps ajuda a aprendre-la més ràpid quan les execucions amb robots reals són costoses

    Dimensionality reduction and motion coordination in learning trajectories with dynamic movement primitives

    Get PDF
    Trabajo presentado al IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), celebrado en Chicago, Illinois (US) del 14 al 18 de septiembre.Dynamic Movement Primitives (DMP) are nowadays widely used as movement parametrization for learning trajectories, because of their linearity in the parameters, rescaling robustness and continuity. However, when learning a movement with a robot using DMP, many parameters may need to be tuned, requiring a prohibitive number of experiments/simulations to converge to a solution with a locally or globally optimal reward. We propose here strategies to palliate this dimensionality problem: the first is to explore only along the most significant directions in the parameter space, and the second is to add a reduced second set of Gaussians that would optimize the trajectory after fixing the Gaussians approximating the demonstrated movement. Both strategies result in less Gaussian computations and better performance on learning algorithms. To further speed up the learning and allow for a better biased exploration, we also propose to coordinate the motion of different joints, by computing a coordination matrix initialized with the demonstrated movement and then automatically updating it by eliminating the degrees of freedom least affecting task performance. Our three proposals have been experimentally tested and the obtained results show that similar (or even better) performance can be obtained at a significantly lower computational cost by reducing the dimensionality of the exploration space.This work is partially funded by EU Project IntellAct (FP7-269959) and by the Spanish Ministry of Science and Innovation under project PAU+DPI2011-27510 and by the CSIC Project CINNOVA (201150E088). A. Colomé is also supported by the Spanish Ministry of Education, Culture and Sport via a FPU doctoral grant (AP2010-1989).Peer Reviewe

    Closed-loop inverse kinematics for redundant robots: Comparative assessment and two enhancements

    Get PDF
    Motivated by the need of a robust and practical Inverse Kinematics (IK) algorithm for the WAM robot arm, we reviewed the most used Closed-Loop IK (CLIK) methods for redundant robots, analysing their main points of concern: convergence, numerical error, singularity handling, joint limit avoidance, and the capability of reaching secondary goals. As a result of the experimental comparison, we propose two enhancements. The first is a new filter for the singular values of the Jacobian matrix that guarantees that its conditioning remains stable, while none of the filters found in literature is successful at doing so. The second is to combine a continuous task priority strategy with selective damping to generate smoother trajectories. Experimentation on the WAM robot arm shows that these two enhancements yield an IK algorithm that improves on the reviewed state-of-the-art ones, in terms of the good compromise it achieves between time step length, Jacobian conditioning, multiple task performance, and computational time, thus constituting a very solid option in practice. This proposal is general and applicable to other redundant robots.This research is partially funded by the CSIC project CINNOVA (201150E088) and the Catalan grant 2009SGR155. A. Colomé is also supported by the Spanish Ministry of Education, Culture and Sport via a FPU doctoral grant (AP2010-1989).Peer Reviewe

    Dual REPS: a generalization of relative entropy policy search exploiting bad experiences

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Policy search (PS) algorithms are widely used for their simplicity and effectiveness in finding solutions for robotic problems. However, most current PS algorithms derive policies by statistically fitting the data from the best experiments only. This means that experiments yielding a poor performance are usually discarded or given too little influence on the policy update. In this paper, we propose a generalization of the relative entropy policy search (REPS) algorithm that takes bad experiences into consideration when computing a policy. The proposed approach, named dual REPS (DREPS) following the philosophical interpretation of the duality between good and bad, finds clusters of experimental data yielding a poor behavior and adds them to the optimization problem as a repulsive constraint. Thus, considering that there is a duality between good and bad data samples, both are taken into account in the stochastic search for a policy. Additionally, a cluster with the best samples may be included as an attractor to enforce faster convergence to a single optimal solution in multimodal problems. We first tested our proposed approach in a simulated reinforcement learning setting and found that DREPS considerably speeds up the learning process, especially during the early optimization steps and in cases where other approaches get trapped in between several alternative maxima. Further experiments in which a real robot had to learn a task with a multimodal reward function confirm the advantages of our proposed approach with respect to REPS.Peer ReviewedPostprint (author's final draft

    Dual REPS: a generalization of relative entropy policy search exploiting bad experiences

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Policy search (PS) algorithms are widely used for their simplicity and effectiveness in finding solutions for robotic problems. However, most current PS algorithms derive policies by statistically fitting the data from the best experiments only. This means that experiments yielding a poor performance are usually discarded or given too little influence on the policy update. In this paper, we propose a generalization of the relative entropy policy search (REPS) algorithm that takes bad experiences into consideration when computing a policy. The proposed approach, named dual REPS (DREPS) following the philosophical interpretation of the duality between good and bad, finds clusters of experimental data yielding a poor behavior and adds them to the optimization problem as a repulsive constraint. Thus, considering that there is a duality between good and bad data samples, both are taken into account in the stochastic search for a policy. Additionally, a cluster with the best samples may be included as an attractor to enforce faster convergence to a single optimal solution in multimodal problems. We first tested our proposed approach in a simulated reinforcement learning setting and found that DREPS considerably speeds up the learning process, especially during the early optimization steps and in cases where other approaches get trapped in between several alternative maxima. Further experiments in which a real robot had to learn a task with a multimodal reward function confirm the advantages of our proposed approach with respect to REPS.Peer ReviewedPostprint (author's final draft

    Demonstration-free contextualized probabilistic movement primitives, further enhanced with obstacle avoidance

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Movement Primitives (MPs) have been widely used over the last years for learning robot motion tasks with direct Policy Search (PS) reinforcement learning. Among them, Probabilistic Movement Primitives (ProMPs) are a kind of MP based on a stochastic representation over sets of trajectories, which benefits from the properties of probability operations. However, the generation of such ProMPs requires a set of demonstrations to capture motion variability. Additionally, using context variables to modify trajectories coded as MPs is a popular approach nowadays in order to adapt motion to environmental variables. This paper proposes a contextual representation of ProMPs that allows for an easy adaptation to changing situations through context variables, by reparametrizing motion with them. Moreover, we propose a way of initializing contextual trajectories without the need of real robot demonstrations, by setting an initial position, a final position, and a number of trajectory interest points, where the contextual variables are evaluated. The parametrizations obtained show to be accurate while relieving the user from the need of performing costly computations such as conditioning. Additionally, using this contextual representation, we propose a simple yet effective quadratic optimization-based obstacle avoidance method for ProMPs. Experiments in simulation and on a real robot show the promise of the approach.Peer ReviewedPostprint (author's final draft

    Dimensionality reduction in learning Gaussian mixture models of movement primitives for contextualized action selection and adaptation

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Robotic manipulation often requires adaptation to changing environments. Such changes can be represented by a certain number of contextual variables that may be observed or sensed in different manners. When learning and representing robot motion –usually with movement primitives–, it is desirable to adapt the learned behaviors to the current context. Moreover, different actions or motions can be considered in the same framework, using contextualization to decide which action applies to which situation. Such frameworks, however, may easily become large-dimensional, thus requiring to reduce the dimensionality of the parameters space, as well as the amount of data needed to generate and improve the model over experience. In this paper, we propose an approach to obtain a generative model from a set of actions that share a common feature. Such feature, namely a contextual variable, is plugged into the model to generate motion. We encode the data with a Gaussian Mixture Model in the parameter space of Probabilistic Movement Primitives (ProMPs), after performing Dimensionality Reduction (DR) on such parameter space. We append the contextual variable to the parameter space and obtain the number of Gaussian components, i.e., different actions in a dataset, through Persistent Homology. Then, using multimodal Gaussian Mixture Regression (GMR), we can retrieve the most likely actions given a contextual situation and execute them. After actions are executed, we use Reward-Weighted Responsibility GMM (RWR-GMM) update the model after each execution. Experimentation in 3 scenarios shows that the method drastically reduces the dimensionality of the parameter space, thus implementing both action selection and adaptation to a changing situation in an efficient way.Peer ReviewedPostprint (author's final draft

    Dimensionality reduction for dynamic movement primitives and application to bimanual manipulation of clothes

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Dynamic Movement Primitives (DMPs) are nowadays widely used as movement parametrization for learning robot trajectories, because of their linearity in the parameters, rescaling robustness and continuity. However, when learning a movement with DMPs, a very large number of Gaussian approximations needs to be performed. Adding them up for all joints yields too many parameters to be explored when using Reinforcement Learning (RL), thus requiring a prohibitive number of experiments/simulations to converge to a solution with a (locally or globally) optimal reward. In this paper we address the process of simultaneously learning a DMP-characterized robot motion and its underlying joint couplings through linear Dimensionality Reduction (DR), which will provide valuable qualitative information leading to a reduced and intuitive algebraic description of such motion. The results in the experimental section show that not only can we effectively perform DR on DMPs while learning, but we can also obtain better learning curves, as well as additional information about each motion: linear mappings relating joint values and some latent variables.Peer ReviewedPostprint (author's final draft

    Positioning two redundant arms for cooperative manipulation of objects

    Get PDF
    Bimanual manipulation of objects is receiving a lot of attention nowadays, but there is few literature addressing the design of the arms configuration. In this paper, we propose a way to analyze the relative positioning of two redundant arms, both equipped with spherical wrists, in order to obtain the best common workspace for grasping purposes. Considering the geometry of a robot with a spherical wrist, the Cartesian workspace can be discretized, with an easy representation of the feasible end-effector orientations at each point using bounding cones. After having characterized the workspace for one robot arm, we can evaluate how good each of the discretized poses relate with an identical arm in another position with a quality function that considers orientations. In the end, we obtain a quality value for each relative position of two arms, and we perform an optimization using genetic algorithms to obtain the best workspace for a cooperative task.Peer ReviewedPostprint (author’s final draft
    • …
    corecore